STATS 3 MENU
REGRESSION
For the tests that follow, all except LOGIT regression have
similar input and output structures. You will be asked for the
independent variables and for the one dependent variable. You will
then be asked for the variable (column) into which the calculated
values should be placed. The program does not place the residuals
in a variable (column), as this would restrict the number of
variables that could actually be used in the regression. To get the
residuals, simply subtract the calculated values from the actual
values in the data editor. The regression types differ only in the
additional inputs described below.
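The residual calculation is nothing more than a column subtraction.
A minimal sketch in Python (not B/STAT itself; the values are made
up):

    # Residuals are simply actual minus calculated (fitted) values.
    actual = [4.0, 5.1, 6.3, 7.2]      # the dependent variable
    fitted = [4.2, 5.0, 6.1, 7.4]      # values the program stored
    residuals = [a - f for a, f in zip(actual, fitted)]
    print(residuals)                   # roughly [-0.2, 0.1, 0.2, -0.2]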
-Multiple regression is a traditional regression.
-Ridge regression requires the entry of a ridge factor, which
should be small and between 0 and 1 (most often below 0.2). A rough
sketch of the calculation appears after this list.
-Stepwise regression is like multiple regression, except that you
specify all independent variables to be considered. The program
decides which of these to actually use in the regression.
-Cochran refers to a regression done using the Cochrane-Orcutt
procedure. A "Cochran" factor of between 0 and 1 must be supplied.
This type of regression uses a fraction of the previous observation
in the calculation. If the Cochran factor is 1, the regression is
calculated on the first differences of the variables. A sketch of
the transformation also appears after this list.
-Huber regression is used to reduce the weight given to outliers
in the data. You will need to specify two additional pieces of
data. The first is the variable into which the program places the
weights, and the second is the value of the residual at which the
weights should start to be changed. This procedure can only be
used after first doing a traditional regression.
-Weighted regression requires you to specify a weight variable
before execution.
-Chow regression is a simple modification of multiple regression.
It is used to see whether the regression parameters are constant
over the range of the data. You will have to specify the number of
points to keep in the first sample.
-LOGIT regression is used when the dependent variable is to be
constrained to a value above 0 but below 1. LOGIT setup converts
unsummarized data to the form required by the regression program.
(Save original data first!)
-Principal Components is not actually a regression method at all.
It is a process used to reduce the number of variables needed to
explain the variation in the data. The resulting variables are
orthogonal; that is, the correlation between any two of them is 0.
Regression can often then be carried out against these pseudo-
variables. The process is destructive, in that it overwrites the
existing variables. Each new variable is a linear combination of
the original ones.
-Correlation matrix shows the correlations among a group of
variables, rather than doing a full regression. This is often done
to look at the effects of multicollinearity in the data.
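Two rough Python/NumPy sketches follow for options mentioned above.
They illustrate the standard formulas, not B/STAT's own code, and
the data are invented. First, the ridge factor k is simply added to
the diagonal of X'X before the normal equations are solved:

    import numpy as np

    def ridge_fit(X, y, k):
        """Ridge coefficients: solve (X'X + k*I) b = X'y.
        k is the small ridge factor (usually below 0.2)."""
        XtX = X.T @ X
        return np.linalg.solve(XtX + k * np.eye(XtX.shape[0]), X.T @ y)

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=50)
    x2 = x1 + 0.01 * rng.normal(size=50)        # nearly collinear with x1
    X = np.column_stack([np.ones(50), x1, x2])  # intercept plus two predictors
    y = 3 + 2 * x1 + rng.normal(size=50)
    print(ridge_fit(X, y, k=0.1))               # ridge shrinks the estimates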
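Second, the Cochrane-Orcutt idea: each variable is quasi-differenced
with the Cochran factor before an ordinary regression is run, and a
factor of 1 gives plain first differences:

    import numpy as np

    def cochrane_orcutt_transform(y, x, rho):
        """Quasi-difference the dependent and an independent series;
        with rho = 1 this is just first differences."""
        y = np.asarray(y, dtype=float)
        x = np.asarray(x, dtype=float)
        return y[1:] - rho * y[:-1], x[1:] - rho * x[:-1]

    y = [10.0, 12.0, 13.5, 15.0, 17.2]
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y_star, x_star = cochrane_orcutt_transform(y, x, rho=1.0)
    print(y_star)    # [2.  1.5 1.5 2.2] -- the first differences
    print(x_star)    # [1. 1. 1. 1.]
    # An ordinary regression is then run on y_star against x_star.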
TIME SERIES
These are methods of smoothing or projecting data. They are often
used in combination with other procedures.
-Moving average requires you to choose the variable and the period
of the moving average. You must also select a variable into which
the averaged values will be placed.
-Geometric moving average requires the same input as linear moving
average.
-Fourier smoothing requires a variable to smooth and a variable in
which to place the result. It also asks for the number of terms to
be kept in the intermediate calculations. This value should be less
than 50, usually less than 15. There must be no missing data for
this procedure to work. Note that this can be a slow process.
-Brown 1-way exponential smoothing is simple exponential smoothing
(a sketch of the recursion appears after this list). You will be
asked to specify the variable to smooth and a variable in which to
store the result. In addition, you will need a smoothing constant
(0 to 1) and a starting value. If you do not specify the starting
value, the program will generate one. This process is not designed
for data with a distinct trend line. If there is a distinct linear
trend, then 2-way exponential smoothing should be used.
-Brown's 2-way exponential smoothing uses linear regression to
estimate a starting value and trend. You must supply the smoothing
coefficient, the variable to smooth, and a variable for the
result.
-Holt's 2-way exponential smoothing is similar to Brown's, except
that a separate smoothing coefficient is used for the trend
factor.
-Winters' exponential smoothing is used if there is a seasonal
aspect to the data (like retail sales, which have a December peak).
You will have to enter 4 quantities. The first is the smoothing
coefficient for level. The second is for trend. The third is for
seasonality. The fourth value is the period of seasonality. Note
that this method should not be used with data fluctuating above
and below zero. With data that go below zero, add a constant to
the data to eliminate negative values. Then, after smoothing,
subtract the constant.
-Interpolation
B/STAT offers three ways of estimating unavailable data.
-Simple linear interpolation requires that you simply select the
variable.
-Lagrangian interpolation requires two variables: an "X" variable
and a "Y" variable. There can be no missing "X" values. This can
be slow with a large data set, since every known point is used in
estimating each missing value (see the sketch after this list).
-Cubic spline interpolation assumes that the data set in the
selected variable consists of evenly-spaced observations.
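Two more rough Python sketches for procedures referenced above,
again illustrating the textbook formulas on invented data rather
than B/STAT's own code. Brown's one-way smoothing is the recursion
s[t] = a*x[t] + (1 - a)*s[t-1]; seeding with the first observation
when no starting value is given is an assumption here, not
necessarily B/STAT's rule:

    def brown_simple_smooth(series, alpha, start=None):
        """One-way (simple) exponential smoothing."""
        s = series[0] if start is None else start
        smoothed = []
        for x in series:
            s = alpha * x + (1 - alpha) * s
            smoothed.append(s)
        return smoothed

    print(brown_simple_smooth([10, 12, 11, 13, 12, 14], alpha=0.3))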
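The Lagrangian estimate of a missing "Y" value uses every known
point, which is why it can be slow on a large data set:

    def lagrange_estimate(xs, ys, x_missing):
        """Estimate y at x_missing from all known (x, y) pairs
        using the Lagrange interpolating polynomial."""
        total = 0.0
        for i, xi in enumerate(xs):
            term = ys[i]
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x_missing - xj) / (xi - xj)
            total += term
        return total

    xs = [1.0, 2.0, 4.0, 5.0]      # the "X" values (none may be missing)
    ys = [1.0, 4.0, 16.0, 25.0]    # the known "Y" values
    print(lagrange_estimate(xs, ys, 3.0))   # about 9.0, since y = x**2 here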
EXTRACT
These selections allow you to reduce the size of the data set. The
first option sums the data. For example, if you want to get yearly
totals from a data set of monthly data, you can extract summed data
and reduce the data by a factor of 12. Each element would then be
a yearly total. In the non-summed case, only every 12th value would
be left. No summing would be done. This is useful if you want to
look at subsets in isolation.
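A rough Python illustration of the two forms of extraction, using
made-up monthly data (not B/STAT's code):

    monthly = list(range(1, 25))    # 24 months of hypothetical data
    # Summed extraction: reduce by a factor of 12; each element is a
    # yearly total.
    yearly_totals = [sum(monthly[i:i + 12]) for i in range(0, 24, 12)]
    print(yearly_totals)            # [78, 222]
    # Non-summed extraction: keep only every 12th value.
    print(monthly[11::12])          # [12, 24]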
MISCELLANEOUS
This menu has two procedures, in addition to the usual help
selection.
-Crosstabs is used to summarize data contained in two or three
variables. It produces a count for each combination of values in
the chosen variables. For example, you may have data on the height
and weight of a group of army recruits. You could use crosstabs to
find out the number in each height and weight classification,
where these could be height in 2-inch increments and weight in
5-pound increments. It is most commonly used in market research
for crosses, such as people aged 30 to 34 who earn between 20,000
and 30,000 dollars per year.
You first select the variables to use in the crosstab. If you
select two, then a 2-way crosstab is done. If three, then a 3-way
crosstab is done. Next, you select the break points for the
classes in each variable. There may be up to 14 break points,
giving a maximum of 15 classes for each variable. You need only
type in as many break points as a specific variable requires, and
leave the rest blank. The number of break points can be different
for each variable. Note that the lower class includes the break
point value. Thus, a break point of 200 pounds would put 200-pound
people in the lower class and 200.01-pound people in the higher
class (a rough sketch of this counting appears at the end of this
section). The program will print out the results. If you want, you
may replace the data in memory with the summarized totals. This
can be quite useful if you then want to perform a Chi-square test,
type 2, on the result to see if there are any significant
relationships.
-Difference is a rather simple process. The difference of a
variable is simply the amount of its change from one period to the
next. Some procedures work better on the change in a variable than
on the variable itself. This is especially true in Box-Jenkins
analysis. You merely supply the variable to difference and the
variable into which to place the result.
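A rough Python illustration of differencing, on invented data:

    series = [100, 104, 103, 108, 110]
    changes = [b - a for a, b in zip(series, series[1:])]
    print(changes)    # [4, -1, 5, 2]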
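Finally, as noted above, a rough Python sketch of the crosstab
counting rule. The data and break points are invented; the point is
only that values equal to a break point fall in the lower class:

    from collections import Counter

    def classify(value, breaks):
        """Return the class index for a value; values equal to a
        break point fall in the lower class."""
        for i, b in enumerate(breaks):
            if value <= b:
                return i
        return len(breaks)

    heights = [66, 68, 71, 73, 69, 74]        # inches
    weights = [150, 180, 200, 210, 160, 220]  # pounds
    h_breaks = [68, 72]                       # 3 height classes
    w_breaks = [160, 200]                     # 3 weight classes

    counts = Counter((classify(h, h_breaks), classify(w, w_breaks))
                     for h, w in zip(heights, weights))
    for cell, n in sorted(counts.items()):
        print(cell, n)                        # count in each height/weight cell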